About the Provider

Moonshot AI is a Chinese AI research company focused on building large-scale foundation models with advanced agentic and multimodal capabilities. Kimi K2 Thinking is its flagship open-weights reasoning model, and the first open-weights model to outperform leading closed models, including GPT-5 and Claude Sonnet 4.5, on several major benchmarks.

Model Quickstart

This section helps you quickly get started with the moonshotai/Kimi-K2-Thinking model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the moonshotai/Kimi-K2-Thinking model and receive responses based on your input prompts. The example below shows how the model can be accessed; adapt it to the client or language that best fits your workflow.
from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",  # replace with your Qubrid API key
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Thinking",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=16384,
    temperature=1,
    top_p=0.95,
    stream=True,
)

# Streaming: print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Non-streaming: set stream=False above, remove the loop, and read the
# full response object instead:
# print(response.choices[0].message.content)
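If you prefer raw HTTP over the OpenAI SDK, the same request can be sent with Python requests. This is a minimal sketch assuming the Qubrid endpoint is OpenAI-compatible at /v1/chat/completions (as the SDK example above implies); the API key is read from the QUBRID_API_KEY environment variable rather than hard-coded.

```python
import os

import requests

QUBRID_BASE_URL = "https://platform.qubrid.com/v1"


def build_payload(prompt: str, stream: bool = False) -> dict:
    """Assemble an OpenAI-compatible chat-completions payload."""
    return {
        "model": "moonshotai/Kimi-K2-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16384,
        "temperature": 1,
        "top_p": 0.95,
        "stream": stream,
    }


# Only fires when a key is present in the environment.
if os.environ.get("QUBRID_API_KEY"):
    resp = requests.post(
        f"{QUBRID_BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"},
        json=build_payload("Explain quantum computing in simple terms"),
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

Keeping the key in the environment avoids committing credentials alongside example code.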

Model Overview

Kimi K2 Thinking is the first open-weights model to achieve state-of-the-art results against leading closed models, including GPT-5 and Claude Sonnet 4.5, on major benchmarks: HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%).
  • Built on a 1T-parameter MoE architecture with 32B parameters active per token and native INT4 quantization via Quantization-Aware Training (QAT), it runs at roughly 2x the generation speed of FP8 deployments.
  • The model maintains stable tool-use across 200–300 sequential calls within a 256K context window, with interleaved chain-of-thought and dynamic tool calling for complex agentic workflows.

Model at a Glance

Feature          Details
Model ID         moonshotai/Kimi-K2-Thinking
Provider         Moonshot AI
Architecture     Sparse MoE Transformer: 1T total / 32B active per token, native INT4 via Quantization-Aware Training (QAT)
Parameters       1T total / 32B active per token
Context Length   256K tokens
Release Date     2025
License          Apache 2.0
Training Data    Large-scale multilingual dataset with RL post-training for agentic reasoning and tool use

When to use?

You should consider using Kimi K2 Thinking if:
  • You need complex agentic research workflows with multi-step tool orchestration
  • Your application requires long-horizon coding and debugging
  • You are solving advanced mathematical reasoning tasks
  • Your use case involves autonomous writing and analysis
  • You need a model that outperforms GPT-5 and Claude Sonnet 4.5 on open benchmarks
  • Your workflow requires stable tool use across 200–300 sequential calls

Inference Parameters

Parameter      Type      Default   Description
stream         boolean   true      Enable streaming responses for real-time output.
temperature    number    1         Sampling temperature; 1.0 is recommended for Kimi-K2-Thinking.
max_tokens     number    16384     Maximum number of tokens to generate.
top_p          number    0.95      Nucleus sampling threshold.
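These defaults can be collected in one place so every request uses consistent settings. A minimal sketch; inference_params is an illustrative helper, not part of any Qubrid SDK:

```python
# Documented defaults for moonshotai/Kimi-K2-Thinking on Qubrid.
DEFAULTS = {
    "stream": True,
    "temperature": 1.0,  # recommended value for Kimi-K2-Thinking
    "max_tokens": 16384,
    "top_p": 0.95,
}


def inference_params(**overrides) -> dict:
    """Merge caller overrides onto the documented defaults,
    rejecting parameter names the table does not define."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```

Rejecting unknown names catches typos like `temp` before they are silently dropped by the API.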

Key Features

  • First Open-Weights Model to Beat Closed Frontier Models: Achieves SOTA on HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%), surpassing GPT-5 and Claude Sonnet 4.5.
  • Native INT4 via QAT: Quantization-Aware Training enables INT4 inference at 2x the speed of FP8 without accuracy loss.
  • Stable Long-Horizon Tool Use: Maintains consistent tool-calling behaviour across 200–300 sequential calls within a single context window.
  • Interleaved Chain-of-Thought: Dynamically interleaves reasoning traces with tool calls for interpretable agentic execution.
  • 1T MoE Architecture: Frontier-scale capacity with only 32B parameters active per token for efficient inference.
  • 256K Context Window: Supports long-horizon document analysis, multi-turn agentic tasks, and extended reasoning chains.
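The long-horizon tool-use loop described above can be sketched against an OpenAI-compatible tool-calling schema. This is a hedged illustration: the get_time tool and its registry are hypothetical, and the surrounding model-call loop is shown only in comments.

```python
import json

# Hypothetical local tool registry; the tool name and its behaviour
# are illustrative only, not part of the model or the Qubrid API.
TOOLS = {"get_time": lambda: "2025-01-01T00:00:00Z"}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one model-requested tool call (OpenAI-compatible shape)
    and format the result as a `tool` message for the next model turn."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"] or "{}")
    result = TOOLS[fn["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": str(result),
    }


# Agentic loop (network calls omitted):
#   while the assistant message contains tool_calls:
#       messages.append(assistant_message)
#       messages.extend(dispatch_tool_call(tc) for tc in tool_calls)
#       assistant_message = client.chat.completions.create(...)
```

With a 256K window, the message list from 200-300 such iterations can stay inside a single context rather than being summarized away.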

Summary

Kimi K2 Thinking is Moonshot AI’s flagship open-weights reasoning model and the first open-weights model to surpass leading closed frontier models on major benchmarks.
  • It uses a 1T MoE Transformer with 32B active parameters and native INT4 via QAT, running at 2x the speed of FP8 deployments.
  • It achieves SOTA on HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%), outperforming GPT-5 and Claude Sonnet 4.5.
  • The model supports 256K context, stable 200–300 sequential tool calls, and interleaved chain-of-thought reasoning.
  • Licensed under Apache 2.0 for full commercial use.